Summary

Diagnostic trials evaluating a single marker or comparing two markers often employ an 
arbitrary sampling ratio between the case and the control groups. Such a ratio is not 
always an efficient choice when the goal is to maximize the power or to minimize the 
total required sample size. Instead, optimal sampling ratios, discussed by Janes and Pepe 
(2006), offer a better alternative for one-marker trials. In this paper we focus on com- 
parative diagnostic trials which are frequently employed to compare two markers with 
continuous or ordinal results. We derive explicit expressions for the optimal sampling 
ratio based on a common variance structure shared by many existing summary statis- 
tics of the receiver operating characteristic (ROC) curve. Estimating the optimal ratio 
requires either pilot data or parametric model assumptions; however, pilot data are often
unavailable at the planning stage of diagnostic trials. In the absence of pilot data, some 
distributions have to be assumed for carrying out the calculation. An optimal ratio from 
an incorrect distributional assumption may lead to an underpowered study. We propose 
a two-stage procedure to adaptively estimate the optimal ratio in comparative diagnostic 
trials without pilot data or assuming parametric distributions. We illustrate the properties 
of the proposed method through theoretical proofs and extensive simulation studies. We 
use an example in cancer diagnostic studies to illustrate the application of our method. 
We find that our method increases the power, or reduces the required overall sample size 
dramatically. 

Keywords: AUC; Diagnostic accuracy; Internal pilot data; Two-stage design 

1. Introduction

Diagnostic trials estimate the diagnostic accuracy of a marker or compare the diagnos- 
tic accuracy of two markers. For example, in a diagnostic trial by Hendrick and others 
(2008), investigators compared the accuracy of digital mammography with screen-film 
mammography. Pepe and others (2001) refer to these trials as phase III diagnostic trials.
In these trials, the true disease status of subjects is known. To evaluate the diagnostic
accuracy of a binary marker, sensitivity and specificity are used. Sensitivity is the
probability of having a positive test result for a case subject. Specificity is the probabil- 
ity of having a negative test result for a control subject. The false positive rate (FPR) 
is 1 - specificity. For continuous markers, we obtain sensitivity and false positive rate
(FPR) based on a threshold that distinguishes the test result as being positive or neg- 
ative. A varying threshold allows a number of sensitivities and FPRs to be computed 
simultaneously. The receiver operating characteristic (ROC) curve is a plot of sensitivity 
versus FPR for all possible thresholds. 

Typically the ratio between the number of cases versus the number of controls is fixed 
in advance. Most diagnostic trials apply an equal case-control ratio; for example, a lung 
cancer prevention trial recruited 71 prostate cancer cases and 71 age-matched controls 
without cancer (Etzioni and others, 2003). A diagnostic study in Hendrick and others 
(2008) compared the accuracy of digital mammography with screen-film mammogra- 
phy using equal numbers of breast cancer patients and controls. In a colorectal cancer- 
screening study, about the same number of colorectal cancer patients and non-cancer 
subjects were used to identify markers (Janes and others, 2005). The equal ratio, how- 
ever, may not be optimal in maximizing the test power or minimizing the total required 
sample size. A procedure proposed by Janes and Pepe (2006) estimates the optimal ratio
for evaluating a continuous marker. The ratio is optimal with regard to minimizing
the variance, or maximizing the power for a fixed total required sample size. Equiva- 
lently, the optimal ratio minimizes the total required sample size with a fixed power. 
To the best of our knowledge, their method is the first attempt to identify the optimal 
sampling ratio in diagnostic trials. However, since the optimal ratio is derived using the 
first derivative of the ROC curve, their method cannot be used for ordinal data which 
often occur in medical imaging studies. More importantly, pilot data are required to 
estimate the optimal ratio. In the absence of pilot data, some distributions have to be as- 
sumed for carrying out the calculation. An optimal ratio from an incorrect distributional 
assumption may lead to an underpowered study. In addition, optimal ratios for compar- 
ative diagnostic trials are of interest to investigators, but have not been discussed in the 
literature. 

In this paper we derive the optimal sampling ratio of cases to controls in compara- 
tive diagnostic trials. The proposed optimal ratio is based on a common variance struc- 
ture shared among existing ROC summary statistics. Special cases of these statistics 
include the nonparametric area under the ROC curve (AUC) statistic proposed by De- 
Long and others (1988) and the weighted AUC statistic by Wieand and others (1989). 
These statistics have been applied in the sequential diagnostic trial design by Mazum- 
dar and Liu (2003) and Liu and others (2008). The calculation of the optimal sampling 
ratio requires either parametric model assumptions or pilot data. When the parametric 
model is incorrectly specified, the resulting ratio may not give the optimal power or the 
minimal required sample size. It is desirable to re-calculate the optimal ratio when data 
become available during the trial. We propose a two-stage method to incorporate the 
idea of internal pilot data, reviewed in Proschan (2004). We assume a parametric model 
at the beginning of the trial to obtain the initial optimal ratio. This ratio is used to sample
the cases and controls at the first stage. When sufficient observations are available, the 
optimal ratio is re-calculated at the second stage, and the numbers of cases and controls 
are adjusted accordingly. We show that although the optimal ratio is updated during a 
diagnostic trial, the analysis at the end of the trial can be carried out in the same fashion 
as in the traditional trial without affecting the nominal type I error rate. 

The paper is organized as follows. In Section 2, we start with the optimal ratio for 
comparative diagnostic trials based on common ROC statistics. We then present the ex- 
plicit expressions of the optimal ratios for comparing AUCs and for comparing weighted 
AUCs. In Section 3, we propose a two-stage procedure to adaptively estimate the opti- 
mal sampling ratio using the internal pilot data. We illustrate the power increase and the 
savings on the overall required sample size using the proposed method through a cancer 
example in Section 4. Section 5 investigates the small sample performance of the pro- 
posed procedure in maintaining the nominal type I error rate and increasing the power. 
Some discussion is presented in Section 6. 

2. Optimal sampling ratio 

Suppose we have N subjects with m cases and n controls. Each subject is measured by
diagnostic test $\ell$ ($\ell = 1, 2$). We define the ith case as $X_{\ell i}$, where i = 1, ..., m, and the
jth control as $Y_{\ell j}$, where j = 1, ..., n. The joint survival functions for cases
are $(X_{1i}, X_{2i}) \sim S_D(x_1, x_2)$ and the joint survival functions for controls are
$(Y_{1j}, Y_{2j}) \sim S_{\bar D}(y_1, y_2)$. Their marginal survival functions are $X_{\ell i} \sim S_{D,\ell}(x)$ and
$Y_{\ell j} \sim S_{\bar D,\ell}(y)$, respectively. For the threshold c varying in $(-\infty, +\infty)$, the sensitivity is
$S_{D,\ell}(c) = \Pr(X_{\ell i} > c)$, and the FPR is $S_{\bar D,\ell}(c) = \Pr(Y_{\ell j} > c)$. Subsequently, the ROC
curve for test $\ell$ is defined as $R_\ell(u) = S_{D,\ell}\{S_{\bar D,\ell}^{-1}(u)\}$, where the FPR, u, falls within [0, 1].

Summary measures for a single ROC curve include the area under the ROC curve
(AUC), the partial AUC (pAUC), and the weighted AUC (wAUC). The AUC gives the
probability that a measurement randomly selected from the case group is greater than
a measurement randomly selected from the control group (Bamber, 1975; Hanley and
McNeil, 1982); that is, $\Pr(X > Y) = \int_0^1 S_D\{S_{\bar D}^{-1}(u)\}\,du$. The wAUC by Wieand and
others (1989) is given by

$$\Omega = \int_0^1 S_D\{S_{\bar D}^{-1}(u)\}\,dW(u), \qquad (2.1)$$

where W(u) is a probability measure. Letting W(u) place unit mass at a single point $u_0$, a
given FPR, yields the sensitivity of a test at that FPR; letting W(u) = u, where $u \in (0, 1)$,
yields the AUC. When $W(u) = (u - u_0)/(u_1 - u_0)$, where $u \in (u_0, u_1)$, (2.1) gives the partial AUC.
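To make the definitions of sensitivity, FPR and AUC concrete, the following minimal sketch computes the empirical ROC points and the empirical AUC for one marker. The function names and the toy data are hypothetical, and ties are counted as one half in the AUC, which is an assumption in line with the treatment of ties later in Section 2.2.

```python
def empirical_roc(cases, controls):
    """Return (FPR, sensitivity) pairs, one for each observed threshold c."""
    # Sweep thresholds from high to low; -inf gives the point (1, 1).
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    thresholds.append(float("-inf"))
    return [
        (
            sum(y > c for y in controls) / len(controls),  # FPR = Pr(Y > c)
            sum(x > c for x in cases) / len(cases),        # sensitivity = Pr(X > c)
        )
        for c in thresholds
    ]


def empirical_auc(cases, controls):
    """Estimate Pr(X > Y), counting ties as 1/2."""
    wins = sum((x > y) + 0.5 * (x == y) for x in cases for y in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data, for illustration only.
toy_cases = [3, 4, 5]
toy_controls = [1, 2, 4]
roc_points = empirical_roc(toy_cases, toy_controls)
auc = empirical_auc(toy_cases, toy_controls)
```

The ROC points run from (0, 0) at the largest threshold to (1, 1) at the smallest, matching the definition of the curve as sensitivity plotted against FPR over all thresholds.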

The statistics for comparing markers might be parametric, e.g., the binormal model
of Dorfman and Alf (1969), semiparametric (Zou and others, 1997; Tang and Zhou,
2009), or nonparametric (Mazumdar and Liu, 2003; DeLong and others, 1988; Hanley
and McNeil, 1983; Wieand and others, 1989). Let $\theta$ be the parameter in the ROC
comparison, and $\hat\theta$ be its estimator. Based on the variance expressions for these ROC
statistics, we identify the following common structure for the variance of all these ROC
statistics when the sample sizes get large:

$$\mathrm{var}(\hat\theta) = \frac{v_x}{m} + \frac{v_y}{n}, \qquad (2.2)$$

where $v_x$ is the variance associated with measurements of case patients and $v_y$ is the
variance related to control patients. In this paper we use the nonparametric statistics
by DeLong and others (1988) and Wieand and others (1989). We present the variance
expressions for these statistics in Sections 2.1 and 2.2. One may refer to the other
aforementioned articles for the same variance structure of parametric and semiparametric ROC
statistics.

Given the variance structure in (2.2), the total required sample size in a diagnostic 
trial can be minimized using an optimal sampling ratio when the variance is fixed. In 
other words, the power for comparing two markers can be maximized using this optimal 
sampling ratio. Suppose the total required sample size in the diagnostic trial is N =
m + n and the sampling ratio is r = m/n. Let the variance of $\hat\theta$ be a fixed constant, a. Since
m = rn = Nr/(1 + r), it follows that

$$\frac{v_x}{m} + \frac{v_y}{n} = \frac{1 + r}{N}\left(\frac{v_x}{r} + v_y\right) = a.$$

The total required sample size can then be expressed as

$$N = \frac{1 + r}{a}\left(\frac{v_x}{r} + v_y\right).$$

To minimize N, we take the first derivative with respect to r and set it equal to zero. We obtain
the following equation:

$$\frac{v_y}{a} - \frac{v_x}{a r^2} = 0.$$

By solving the equation above, the optimal sampling ratio is obtained as

$$r^* = \sqrt{v_x / v_y}. \qquad (2.3)$$

The optimal sampling ratio is analogous to the Neyman allocation ratio for clinical trials
which has been widely used to save the overall sample size for a fixed power. Interested 
readers can refer to Jennison and Turnbull (2000) and Rosenberger and Lachin (2002). 
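
The derivation of (2.3) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the values of $v_x$, $v_y$ and the target variance a are made-up inputs rather than quantities from any study.

```python
import math


def optimal_ratio(v_x, v_y):
    """Case-to-control ratio r* = sqrt(v_x / v_y), as in (2.3)."""
    return math.sqrt(v_x / v_y)


def total_sample_size(r, v_x, v_y, a):
    """Total N = (1 + r) (v_x / r + v_y) / a that attains variance a at ratio r."""
    return (1 + r) * (v_x / r + v_y) / a


# Hypothetical inputs chosen only for illustration.
v_x, v_y, target_variance = 0.08, 0.02, 0.001
r_star = optimal_ratio(v_x, v_y)
n_optimal = total_sample_size(r_star, v_x, v_y, target_variance)
n_equal = total_sample_size(1.0, v_x, v_y, target_variance)
```

With these inputs the optimal ratio is 2, and the total sample size at that ratio is smaller than under equal allocation, as the derivation predicts.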

2.1 Optimal sampling ratio for comparing two continuous markers 

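The difference between two wAUCs, $\Delta = \Omega_1 - \Omega_2$, is used in Wieand and others
(1989) to compare the wAUCs for continuous data. Here the estimator $\hat\Omega_\ell$ of $\Omega_\ell$, for
$\ell = 1, 2$, is obtained by substituting the empirical function estimators in (2.1). The
resulting $\Delta$-statistic is given by $\hat\Delta = \hat\Omega_1 - \hat\Omega_2$. Let $w_i$ be

$$w_i = \int_0^1 \left[ S_{D,1}\{S_{\bar D,1}^{-1}(u)\} - I\{X_{1i} \ge S_{\bar D,1}^{-1}(u)\} - S_{D,2}\{S_{\bar D,2}^{-1}(u)\} + I\{X_{2i} \ge S_{\bar D,2}^{-1}(u)\} \right] dW(u),$$

and let $v_j$ be

$$v_j = \int_0^1 \left\{ R'_1(u)\left[ I\{Y_{1j} \ge S_{\bar D,1}^{-1}(u)\} - u \right] - R'_2(u)\left[ I\{Y_{2j} \ge S_{\bar D,2}^{-1}(u)\} - u \right] \right\} dW(u).$$

Tang and others (2008) further study the $\Delta$-statistic and show that for large sample
sizes, $\hat\Delta - \Delta$ is asymptotically equivalent to

$$-\frac{1}{m}\sum_{i=1}^{m} w_i - \frac{1}{n}\sum_{j=1}^{n} v_j. \qquad (2.4)$$

Since the $w_i$'s are i.i.d. random variables corresponding to measurements of case patients
and the $v_j$'s are also i.i.d. random variables related to measurements of control subjects,
(2.3) gives the optimal ratio for comparing the difference between wAUCs:

$$r^* = \sqrt{\mathrm{var}(w_i) / \mathrm{var}(v_j)}, \qquad (2.5)$$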
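where $\mathrm{var}(w_i)$ is given by the following expression:

$$\mathrm{var}(w_i) = \sum_{\ell=1}^{2} \left\{ \int_0^1\!\!\int_0^1 S_{D,\ell}\{S_{\bar D,\ell}^{-1}(s \wedge t)\}\,dW(s)\,dW(t) - \left[ \int_0^1 S_{D,\ell}\{S_{\bar D,\ell}^{-1}(s)\}\,dW(s) \right]^2 \right\}$$
$$\qquad - 2\int_0^1\!\!\int_0^1 \left[ S_D\{S_{\bar D,1}^{-1}(s), S_{\bar D,2}^{-1}(t)\} - S_{D,1}\{S_{\bar D,1}^{-1}(s)\}\,S_{D,2}\{S_{\bar D,2}^{-1}(t)\} \right] dW(s)\,dW(t),$$

and $\mathrm{var}(v_j)$ is given by the following expression:

$$\mathrm{var}(v_j) = \sum_{\ell=1}^{2} \left\{ \int_0^1\!\!\int_0^1 R'_\ell(s) R'_\ell(t)\,(s \wedge t)\,dW(s)\,dW(t) - \left[ \int_0^1 R'_\ell(s)\,s\,dW(s) \right]^2 \right\}$$
$$\qquad - 2\int_0^1\!\!\int_0^1 R'_1(s) R'_2(t) \left[ S_{\bar D}\{S_{\bar D,1}^{-1}(s), S_{\bar D,2}^{-1}(t)\} - st \right] dW(s)\,dW(t),$$

with the derivative of $R_\ell(u)$, $R'_\ell(u) = S'_{D,\ell}\{S_{\bar D,\ell}^{-1}(u)\} / S'_{\bar D,\ell}\{S_{\bar D,\ell}^{-1}(u)\}$.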

Since $\hat\Delta$ compares AUCs, partial AUCs or sensitivities at a particular FPR, we discuss
the optimal ratios for these special cases by specifying corresponding weight functions.
When we let the weight function be W(u) = u, for 0 < u < 1, $\hat\Delta$ compares the
AUCs. The optimal ratio in (2.5) implies that the following ratio between the case and
the control maximizes the power for comparing the AUCs:

$$r^* = \sqrt{v_x / v_y},$$

where $v_x$ and $v_y$ have the following expressions as shown in the Appendix (distinct
indices denote distinct subjects):

$$v_x = \sum_{\ell=1}^{2} \left( E[I(X_{\ell i} > Y_{\ell j}) I(X_{\ell i} > Y_{\ell l})] - [E\{I(X_{\ell i} > Y_{\ell j})\}]^2 \right)$$
$$\qquad - 2\left( E[I(X_{1i} > Y_{1j}) I(X_{2i} > Y_{2l})] - E[I(X_{1i} > Y_{1j})]\,E[I(X_{2i} > Y_{2l})] \right),$$

and

$$v_y = \sum_{\ell=1}^{2} \left( E[I(X_{\ell i} > Y_{\ell j}) I(X_{\ell k} > Y_{\ell j})] - [E\{I(X_{\ell i} > Y_{\ell j})\}]^2 \right)$$
$$\qquad - 2\left( E[I(X_{1i} > Y_{1j}) I(X_{2k} > Y_{2j})] - E[I(X_{1i} > Y_{1j})]\,E[I(X_{2k} > Y_{2j})] \right).$$

The optimal ratio for evaluating one marker, say marker 1, is simply

$$r^* = \sqrt{\frac{E[I(X_{1i} > Y_{1j}) I(X_{1i} > Y_{1l})] - [E\{I(X_{1i} > Y_{1j})\}]^2}{E[I(X_{1i} > Y_{1j}) I(X_{1k} > Y_{1j})] - [E\{I(X_{1i} > Y_{1j})\}]^2}}.$$

Janes and Pepe (2006) derive this ratio in terms of placement values as

$$r^* = \sqrt{\frac{\mathrm{Var}\{S_{\bar D,1}(X_{1i})\}}{\mathrm{Var}\{S_{D,1}(Y_{1j})\}}}.$$

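When $W(u) = I\{u = u_0\}$, where $0 < u_0 < 1$, the $\Delta$-statistic compares the sensitivities
at the FPR $u_0$. The optimal ratio in (2.5) reduces to

$$r^* = \sqrt{\frac{\sum_{\ell=1}^{2} \{ R_\ell(u_0) - [R_\ell(u_0)]^2 \} - 2A}{\sum_{\ell=1}^{2} [R'_\ell(u_0)]^2\, u_0 (1 - u_0) - 2B}},$$

where

$$A = \Pr\{X_{1i} > S_{\bar D,1}^{-1}(u_0),\ X_{2i} > S_{\bar D,2}^{-1}(u_0)\} - R_1(u_0) R_2(u_0)$$

and

$$B = R'_1(u_0) R'_2(u_0) \left[ \Pr\{Y_{1j} > S_{\bar D,1}^{-1}(u_0),\ Y_{2j} > S_{\bar D,2}^{-1}(u_0)\} - u_0^2 \right].$$

The optimal ratio for evaluating marker 1 at the FPR $u_0$ is reduced to the ratio derived
in Janes and Pepe (2006) as $\sqrt{R_1(u_0)\{1 - R_1(u_0)\} / [u_0 (1 - u_0)]} \,/\, R'_1(u_0)$.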
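
As a rough illustration of the one-marker ratio at a fixed FPR, the sketch below evaluates it under a binormal ROC curve $R(u) = \Phi\{a + b\,\Phi^{-1}(u)\}$. The binormal form, the parameters a and b, and the function name are assumptions introduced here for illustration; they are not taken from the derivation above.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def binormal_ratio_at_fpr(a, b, u0):
    """Optimal case-control ratio for the sensitivity at FPR u0, one marker.

    Assumes a binormal ROC curve R(u) = Phi(a + b * Phi^{-1}(u)).
    """
    q = _STD_NORMAL.inv_cdf(u0)
    roc = _STD_NORMAL.cdf(a + b * q)
    # Chain rule: dR/du = b * phi(a + b q) / phi(q), since dq/du = 1 / phi(q).
    roc_slope = b * _STD_NORMAL.pdf(a + b * q) / _STD_NORMAL.pdf(q)
    return math.sqrt(roc * (1 - roc) / (u0 * (1 - u0))) / roc_slope


# Hypothetical binormal parameters, for illustration only.
ratio_informative = binormal_ratio_at_fpr(a=1.0, b=1.0, u0=0.2)
ratio_uninformative = binormal_ratio_at_fpr(a=0.0, b=1.0, u0=0.2)
```

For an uninformative marker (a = 0, b = 1) the ROC curve is the diagonal with slope 1, so the ratio reduces to 1, which serves as a simple sanity check.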

2.2 Optimal sampling ratio for comparing two ordinal markers

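The variance of the $\Delta$-statistic involves the first derivatives of the ROC curves. The
optimal ratio in (2.5) cannot be readily applied to ordinal data, which often occur
in radiology. In addition, the $\Delta$-statistic does not allow for ties in marker observations.
We thus consider the nonparametric statistic by DeLong and others (1988) to obtain
the optimal ratio for comparing two ordinal markers, which are usually two imaging
modalities in radiology. DeLong's statistic estimates $P(X_{1i} > Y_{1j}) - P(X_{2i} > Y_{2j}) +
[P(X_{1i} = Y_{1j}) - P(X_{2i} = Y_{2j})]/2$, and is given as

$$\hat\Delta_D = \frac{1}{mn} \sum_{j=1}^{n} \sum_{i=1}^{m} \left[ \psi(X_{1i}, Y_{1j}) - \psi(X_{2i}, Y_{2j}) \right],$$

where $\psi(X_{\ell i}, Y_{\ell j}) = 1$ for $Y_{\ell j} < X_{\ell i}$; 1/2 for $Y_{\ell j} = X_{\ell i}$; and 0 for $Y_{\ell j} > X_{\ell i}$, for marker
$\ell$, $\ell = 1, 2$. Let $\Omega_\ell^D$ be $P(X_{\ell i} > Y_{\ell j}) + P(X_{\ell i} = Y_{\ell j})/2$ for marker $\ell$, and $\hat\Omega_\ell^D$ be its
estimator. DeLong and others (1988) show that the large sample variance of $\hat\Delta_D$ has the
form $\mathrm{var}(\hat\Delta_D) = v_x^D/m + v_y^D/n$, with

$$v_x^D = \frac{1}{m-1} \sum_{i=1}^{m} \Big\{ \Big[ \frac{1}{n}\sum_{j=1}^{n} \psi(X_{1i}, Y_{1j}) - \hat\Omega_1^D \Big]^2 + \Big[ \frac{1}{n}\sum_{j=1}^{n} \psi(X_{2i}, Y_{2j}) - \hat\Omega_2^D \Big]^2$$
$$\qquad - 2 \Big[ \frac{1}{n}\sum_{j=1}^{n} \psi(X_{1i}, Y_{1j}) - \hat\Omega_1^D \Big] \Big[ \frac{1}{n}\sum_{j=1}^{n} \psi(X_{2i}, Y_{2j}) - \hat\Omega_2^D \Big] \Big\},$$

and

$$v_y^D = \frac{1}{n-1} \sum_{j=1}^{n} \Big\{ \Big[ \frac{1}{m}\sum_{i=1}^{m} \psi(X_{1i}, Y_{1j}) - \hat\Omega_1^D \Big]^2 + \Big[ \frac{1}{m}\sum_{i=1}^{m} \psi(X_{2i}, Y_{2j}) - \hat\Omega_2^D \Big]^2$$
$$\qquad - 2 \Big[ \frac{1}{m}\sum_{i=1}^{m} \psi(X_{1i}, Y_{1j}) - \hat\Omega_1^D \Big] \Big[ \frac{1}{m}\sum_{i=1}^{m} \psi(X_{2i}, Y_{2j}) - \hat\Omega_2^D \Big] \Big\}.$$

Therefore, it follows from (2.3) that the ratio, $r_D^* = \sqrt{v_x^D / v_y^D}$, maximizes the power for
comparing two ordinal markers.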
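
The variance components of DeLong's statistic can be computed directly from paired marker data. The following is a minimal sketch under the assumption that the sample variances use divisors m - 1 and n - 1; the function names and the toy data are hypothetical.

```python
import math


def psi(x, y):
    """Kernel of DeLong's statistic: 1 if y < x, 1/2 if tied, 0 otherwise."""
    if y < x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0


def delong_components(x1, x2, y1, y2):
    """Return (delta_hat, v_x, v_y) for two markers measured on the same subjects.

    x1, x2: marker 1 and marker 2 values for the m cases (paired by index).
    y1, y2: marker 1 and marker 2 values for the n controls (paired by index).
    """
    m, n = len(x1), len(y1)
    # Per-case components: average of psi over all controls, for each marker.
    cx1 = [sum(psi(x, y) for y in y1) / n for x in x1]
    cx2 = [sum(psi(x, y) for y in y2) / n for x in x2]
    # Per-control components: average of psi over all cases, for each marker.
    cy1 = [sum(psi(x, y) for x in x1) / m for y in y1]
    cy2 = [sum(psi(x, y) for x in x2) / m for y in y2]
    auc1, auc2 = sum(cx1) / m, sum(cx2) / m
    # Squaring the difference of centred components gives the two squared
    # terms minus twice the cross term in the variance expressions.
    v_x = sum(((a - auc1) - (b - auc2)) ** 2 for a, b in zip(cx1, cx2)) / (m - 1)
    v_y = sum(((a - auc1) - (b - auc2)) ** 2 for a, b in zip(cy1, cy2)) / (n - 1)
    return auc1 - auc2, v_x, v_y


# Hypothetical toy data, for illustration only.
delta_hat, v_x_d, v_y_d = delong_components(
    x1=[3, 4, 5, 6], x2=[2, 5, 3, 7], y1=[1, 2, 4, 3], y2=[2, 1, 5, 4]
)
r_star_d = math.sqrt(v_x_d / v_y_d)
```

When the two markers are identical the components cancel and both variance terms are zero, so in practice the ratio is only meaningful when the markers differ.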

3. A two-stage procedure to obtain the optimal ratio

One may assume a parametric model to obtain the variances and resulting optimal ra- 
tios derived in the preceding section. When a parametric model is correctly specified, 
the optimal ratio can be calculated from (2.3) for comparing ROC summary measures, 
and the sample size to obtain a specified power can be subsequently derived. However, 
if the parametric model is mis-specified, the calculated sample size may not give the 
appropriate power. We calculated the optimal ratios for comparing the AUCs or pAUCs 
from binormal and bi-exponential distributions. When comparing the AUCs, the optimal 
ratio is close to 1 for a wide range of the correlation parameter values for bivariate nor- 
mal distributions. This implies that equal sampling for two groups yields the maximum 
power for a fixed total required sample size. However, the optimal ratio is around 1.5 for
bi-exponential distributions, indicating that sampling 50% more cases than controls
yields the maximum power to detect a difference between markers. When comparing
the pAUCs, Figure 1 shows the optimal ratios for bivariate normal distributions. The
optimal sampling ratio varies from 0.94 to 1.03 when the correlation coefficient between
the two markers varies from -1 to 1. Based on these two examples, the mis-specification of
parametric models at the planning stage may lead to an incorrect optimal ratio.

Proschan (2004) introduces the concept of internal pilot data which often refers to 
accumulated data after a trial is carried out for a certain period of time. To correct for the 
model mis-specification at the beginning of the trial, we propose a two-stage procedure 
to use internal pilot data after some observations are available during the trial. Suppose 
the total required sample size N is fixed. Without loss of generality, we use a two-sided 
test in the proposed procedure. The procedure is given in the following steps: 

• Step 1: Specify a parametric model to obtain $v_{x,0}$ and $v_{y,0}$, and the resulting initial
optimal ratio, $r^* = \sqrt{v_{x,0}/v_{y,0}}$.

• Step 2: Use the ratio together with $v_{x,0}$ and $v_{y,0}$ in the following sample size formula
to calculate initial sample sizes $m_0$ and $n_0$ with power $1 - \beta$ and the significance
level $\alpha$:

$$m_0 = \frac{(z_{\alpha/2} + z_\beta)^2 (v_{x,0} + r^* v_{y,0})}{\Delta_1^2}, \qquad (3.1)$$

and $n_0 = N - m_0$, where $\Delta_1$ is the difference between ROC summary measures
under the alternative hypothesis.

• Step 3: After sufficient marker measurements are available on $m_1$ cases and $n_1$
controls at the first stage, the variance expressions of either the $\Delta$-statistic (Wieand
and others, 1989) or DeLong's statistic (DeLong and others, 1988) are re-calculated
using available data. These variance estimators, $\hat v_{x,1}$ and $\hat v_{y,1}$, are applied
in (2.3) to re-calculate the optimal ratio, $\hat r^* = \sqrt{\hat v_{x,1}/\hat v_{y,1}}$.

• Step 4: Continue the trial by recruiting $M_2$ cases and $N_2$ controls, where $M_2$ and
$N_2$ are given by

$$M_2 = \frac{N \hat r^*}{1 + \hat r^*} - m_1 \quad \text{and} \quad N_2 = \frac{N}{1 + \hat r^*} - n_1. \qquad (3.2)$$
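
The four steps above can be sketched as two small functions, one for the planning stage and one for the interim update. This is only an outline under stated assumptions: the function names are hypothetical, rounding of the second-stage counts is left to the user, and the variance estimates are taken as given rather than computed from data.

```python
import math
from statistics import NormalDist


def initial_design(v_x0, v_y0, delta1, alpha=0.05, power=0.80):
    """Steps 1-2: initial ratio and initial number of cases from (3.1)."""
    z = NormalDist()
    r0 = math.sqrt(v_x0 / v_y0)
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    m0 = z_sum ** 2 * (v_x0 + r0 * v_y0) / delta1 ** 2
    return r0, m0


def second_stage_counts(n_total, m1, n1, v_x1, v_y1):
    """Steps 3-4: re-estimated ratio and additional cases/controls from (3.2)."""
    r_hat = math.sqrt(v_x1 / v_y1)
    cases_to_add = n_total * r_hat / (1 + r_hat) - m1
    controls_to_add = n_total / (1 + r_hat) - n1
    return r_hat, cases_to_add, controls_to_add


# Interim update with hypothetical first-stage estimates.
r_hat, extra_cases, extra_controls = second_stage_counts(
    n_total=400, m1=50, n1=50, v_x1=0.09, v_y1=0.04
)
```

By construction the first-stage and second-stage counts add up to the fixed total N, so the update only redistributes the remaining subjects between cases and controls.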



It is shown in Proschan (2004) that using the internal pilot data for comparing population
means in clinical trials maintains the nominal type I error rate. The reason is that
the sample variance obtained at the end of the first stage does not give any information
for the sample mean at the end of the trial. The same relationship between the estimated
variance and the test statistic is also true for the $\Delta$-statistic or DeLong's statistic, as
stated in Proposition 1. The proof is provided in the Appendix.

Proposition 1: At the first stage when $m_1$ and $n_1$ get large, the variance estimated at the
first stage does not give any information for the $\Delta$-statistic or DeLong's statistic at the
end.

Proposition 1 shows that estimating the variances and the resulting optimal ratio using
data from the first stage does not reveal information about the estimated difference between
two ROC statistics obtained at the end of the second stage. Thus, although the optimal 
ratio is updated during the trial, the analysis at the end of the trial can be carried out in 
the same fashion as in the trial without updating the optimal ratio. This is important in 
maintaining the proper type I error rate. 



4. Example 

In this section, we applied our method to a cancer diagnostic trial (Goddard and Hinberg, 
1990). In this study 135 cancer patients and 218 non-cancer patients were recruited. A
traditional biomarker, A, and newly developed diagnostic biomarkers were used to test
blood samples from each subject. The unit of measurement was mmol of product per 
minute per millilitre, IU/mm. Measurements are highly skewed for all the methods. We 
compared a new biomarker D and the reference biomarker A to illustrate the power 
increment and the sample size savings by using the proposed procedure. We assumed 
a contrast of $\Delta_1 = 0.05$ between AUCs and a type I error rate of 0.05 for power and
sample size calculation based on a two-sided alternative. At the first stage, we accrued
data on $m_1 = 60$ cancer and $n_1 = 60$ noncancer patients, and obtained the variance
estimates, $\hat v_{x,1} = 0.082$ and $\hat v_{y,1} = 0.035$, which resulted in the optimal case-control
ratio, $\hat r^* = 1.53$, from (2.3). Let N be the overall sample size, which is 353, the sum of
the numbers of cases and controls. Using this optimal ratio in the expression (3.2) in
Step 4 of the proposed procedure, the numbers of cases and controls to be recruited
in the second stage were calculated to be 153 and 80, respectively. The power using the
optimal ratio was then 50.9% using the following equation:

$$1 - \beta = \Phi\left( \frac{\Delta_1}{\sqrt{(1 + \hat r^*)(\hat v_{x,1}/\hat r^* + \hat v_{y,1})/N}} - z_{\alpha/2} \right).$$

This is an increase of about 7 percentage points over the power of 43.8% calculated using
the equation above by replacing $\hat r^*$ with the original case-control ratio of 0.62. We also
investigated the savings on the overall sample size by using the proposed procedure. Using
the original power of 43.8% with the estimated optimal ratio, $\hat r^* = 1.53$, the overall sample
size was calculated to be 292, with 177 cancer patients and 115 noncancer patients. This
offers savings of 61 patients over the original ratio.
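
The arithmetic in this example can be reproduced with a short script. The sketch below plugs the reported estimates (0.082, 0.035, the contrast 0.05, N = 353 and the rounded ratio 1.53) into the variance structure of Section 2; the function names are hypothetical, and the normal power approximation is written in terms of N and r under that structure.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def power_for_ratio(r, n_total, v_x, v_y, delta1, alpha=0.05):
    """Approximate power at case-control ratio r and total sample size n_total."""
    variance = (1 + r) * (v_x / r + v_y) / n_total
    return _Z.cdf(delta1 / math.sqrt(variance) - _Z.inv_cdf(1 - alpha / 2))


def total_size_for_power(r, power, v_x, v_y, delta1, alpha=0.05):
    """Total sample size needed at ratio r to reach the given power."""
    target_variance = (delta1 / (_Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power))) ** 2
    return (1 + r) * (v_x / r + v_y) / target_variance


v_x1, v_y1, delta1, n_total = 0.082, 0.035, 0.05, 353
power_optimal = power_for_ratio(1.53, n_total, v_x1, v_y1, delta1)
power_original = power_for_ratio(135 / 218, n_total, v_x1, v_y1, delta1)
size_at_original_power = total_size_for_power(1.53, 0.438, v_x1, v_y1, delta1)
```

Up to rounding, these three quantities correspond to the 50.9% power, the 43.8% power and the total of 292 subjects reported in the text.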

5. Simulation studies

In this section, we demonstrate the performance of our method for maximizing power
when comparing summary statistics of diagnostic tests. We compared the proposed two-stage
procedure with the equal case-control ratio and a fixed case-control ratio under three
parametric models. Three pairs of AUCs and pAUCs were specified in advance. We
used DeLong's statistic for comparing the AUCs and the $\Delta$-statistic for comparing the
pAUCs. We simulated 5000 data sets from bivariate normal (BN), bivariate lognormal
(LN) and bivariate exponential (BE) distributions, respectively. The bivariate normal
models had the forms of $(X_1, X_2) \sim N\{(\mu_1, \mu_2), \Sigma\}$ and $(Y_1, Y_2) \sim N\{(0, 0), \Sigma\}$,
where in the 2 x 2 matrix $\Sigma$, the diagonal elements are 1's and the off-diagonal elements are
$\rho$. We chose $\rho = 0.1$ and $\rho = 0.25$ in our simulations. $\mu_1$ and $\mu_2$ were computed according
to three pairs of AUCs, (0.70, 0.75), (0.75, 0.80) and (0.70, 0.80), respectively. For
comparing the pAUCs with the FPR in the range of (0, 0.6), $(\mu_1, \mu_2)$ were used for three
pairs of pAUCs, (0.30, 0.35), (0.35, 0.40) and (0.30, 0.40), respectively. The bivariate
lognormal models had the forms of $\exp(X_1, X_2)$ and $\exp(Y_1, Y_2)$ for cases and controls,
respectively. They had the same values of $(\mu_1, \mu_2)$ for the AUCs and pAUCs as above.
According to the algorithm in Gumbel (1960), the bivariate exponential random
variables take the form $H(x, y) = H_1(x) H_2(y) [1 + 4\rho \{1 - H_1(x)\}\{1 - H_2(y)\}]$,
where $\rho \in [-0.25, 0.25]$. We set $\rho$ to be 0.1 and 0.25 here. The marginal survival
functions for cases and controls were $\exp(-\beta_{\ell 1} x)$ and $\exp(-\beta_{\ell 2} y)$, so we could generate
data from these two distributions respectively. In the simulation, we set $\beta_{11} = 1$ and
$\beta_{21} = 1$. $\beta_{12}$ and $\beta_{22}$ were computed according to the AUC or pAUC values. For the
pairs of AUCs (0.70, 0.75), (0.75, 0.80), and (0.70, 0.80), the corresponding $(\beta_{12}, \beta_{22})$
values were (2.333, 3.003), (3.003, 4.000) and (2.333, 4.000). For the pairs of pAUCs
(0.30, 0.35), (0.35, 0.40) and (0.30, 0.40), the $(\beta_{12}, \beta_{22})$ values were (1.8957, 2.5094),
(2.5094, 3.3887) and (1.8957, 3.3887), respectively.
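
The simulation parameters for a target AUC can be obtained from standard closed forms. The sketch below assumes unit-variance normal marginals, for which AUC = $\Phi(\mu/\sqrt{2})$, and exponential marginals with case rate 1, for which AUC = $\beta/(1 + \beta)$ with $\beta$ the control rate. These formulas and the function names are assumptions introduced here and are not stated in the text; the pAUC-based parameters are not covered.

```python
import math
from statistics import NormalDist


def binormal_case_mean(auc):
    """Case mean giving the target AUC when cases are N(mu, 1), controls N(0, 1)."""
    return math.sqrt(2) * NormalDist().inv_cdf(auc)


def exponential_control_rate(auc, case_rate=1.0):
    """Control rate giving the target AUC for exponential cases and controls.

    With survival functions exp(-case_rate * x) and exp(-control_rate * y),
    Pr(X > Y) = control_rate / (case_rate + control_rate).
    """
    return case_rate * auc / (1 - auc)


mu_070 = binormal_case_mean(0.70)
beta_070 = exponential_control_rate(0.70)
beta_080 = exponential_control_rate(0.80)
```

For AUCs of 0.70 and 0.80 the exponential rates from this formula agree with the reported values 2.333 and 4.000 to three decimals; for 0.75 it gives 3.000, close to the reported 3.003.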

In our simulation, we first assumed that our samples were from bivariate normal
distributions, then used equation (3.1) to calculate the initial total required sample size.
With a type I error rate of 0.05 and power of 80%, the initial total required sample sizes were
N = 1421, 1200, or 326 to detect the difference in the three pairs of AUCs, (0.70, 0.75),
(0.75, 0.80) and (0.70, 0.80), respectively, with $\rho = 0.1$. When $\rho = 0.25$, total
required sample sizes of N = 1207, 1025, or 278 were needed to detect the difference
in these pairs. For comparing the pAUCs, the initial total required sample sizes were
N = 1067, 979, and 251 for $\rho = 0.1$, and N = 915, 842 and 216 for $\rho = 0.25$.
There were three different sampling ratios: 1) the proposed two-stage optimal ratio; 2)
a fixed sampling ratio of 0.5; 3) an equal sampling ratio. To implement the proposed method,
we defined the number of available observations at the first stage as $m_1 = n_1 = N/4$.
By substituting the nonparametric variance estimates $\hat v_{x,1}$ and $\hat v_{y,1}$, the resulting optimal
ratio was estimated by $\hat r^* = \sqrt{\hat v_{x,1}/\hat v_{y,1}}$, and $M_2$ and $N_2$ were calculated using (3.2).
We then generated $M_2$ new observations for cases and $N_2$ new observations for controls.
Subsequently, the null hypothesis of equal AUCs or pAUCs was rejected in favor of the
alternative if the Z-statistic calculated using all simulated data was greater than or equal
to $z_{0.025}$. The simulated power was then calculated as the percentage of times out of 5000
that the null hypothesis was rejected. The simulated powers for all simulation settings
are presented in Table 1.

Table 1 illustrates that larger correlations resulted in higher rejection rates. The sam- 
pling ratio is another factor impacting the power when the alternative hypothesis is true. 
For different underlying distributions, the proposed two-stage method has higher power
than the fixed ratios in most of the settings.

We also evaluated the performance of the two-stage procedure to see whether the
procedure maintains the nominal type I error rate. We used the total required sample 
sizes, N = 200, 400, or 500. The parametric distributions and three different sampling 
ratios used in the previous simulation were considered. We assumed equal AUCs or 
pAUCs with the AUCs being (0.70, 0.75, 0.80), and the pAUCs being (0.30, 0.35, 0.40). 
The nominal type I error rate was 0.05 in our simulation. The simulated type I error rates 
are shown in Table 2. All these rates are close to the nominal level when the sample size 
goes to 500. 

Variability in the estimators, $\hat v_{x,1}$ and $\hat v_{y,1}$, is associated with the initial sample sizes
at the first stage. Such variability affects the calculation of the optimal sampling ratio,
which may in turn have an impact on the power of the proposed procedure. We
conducted another simulation study to investigate the impact of the initial sample size
selection. We used a total required sample size of 400, and set the initial sample sizes
of cases and controls to be m = n = 50, 60, 80, or 100. Observations were simulated
from the binormal distributions with a difference of 0.05 between the two AUCs. In each
simulation, the variance estimators for calculating the optimal ratio were obtained at
the first stage under three scenarios, namely, 1) a single set of m cases and n controls,
2) averaging the variance estimates of 10 sets of m cases and n controls, and 3) averaging
the variance estimates of 100 sets of m cases and n controls. Results based on 1000
replications for each setting are listed in Table 3. It indicates some variations in power for
the first scenario. When more datasets are involved in the calculation, power becomes 
more stable regardless of the initial sample sizes. More importantly, Table 3 shows that 
the initial sample size selection had little impact on the final power. 

6. Conclusion

The optimal sampling ratio in diagnostic trials can maximize the test power or minimize 
the overall sample size. The optimal sampling ratio discussed in this paper is analogous 
to the optimal allocation ratio in assigning patient treatments in clinical trials. The op- 
timal allocation ratio has been used in clinical trials for decades, but the importance of 
the optimal ratio in diagnostic trials has not been widely recognized. Implementation 
requires the calculation of complicated variances of frequently used ROC statistics. This 
paper discusses a common variance structure for ROC statistics, and thereby introduces 
optimal sampling ratios in comparative diagnostic trials based on these statistics. Two 
popular nonparametric ROC statistics are used to illustrate the explicit forms of the op- 
timal ratios because their variance expressions can be written as the sum of separate 
terms; one relates to the cases, and the other relates to the controls. The same variance 
structure is shared by many existing parametric and semiparametric ROC statistics. This 
implies that the optimal ratio form derived in (2.3) is also applicable to these existing 
statistics. 

When marker results follow normal distributions, the optimal sampling ratio is close 
to 1 for many parameter settings. Then sampling the same number of cases and controls 
can potentially achieve the maximal power for a fixed total required sample size. When 
the marker results follow exponential distributions, the sampling ratio is close to 1.5. We 
need to sample more cases than controls to gain power or reduce the overall sample size. 
If preliminary studies are available before carrying out a comparative diagnostic trial, 
the variance can be estimated using pilot data to obtain the optimal ratio for comparing 
specified ROC summary measures. The ratio can then be used to recruit patients in 
the trial, and re-calculating the ratio may not be necessary during the trial. However, 
when medical practitioners do not have preliminary data for the markers and are not
certain about the distributions of the marker results, the distribution assumption used 
for obtaining the optimal ratio may be far from the true underlying distributions for the 
marker results. This may result in less power or larger overall sample sizes than using the 
true optimal ratio. The proposed two-stage procedure is then particularly useful to ensure 
that the optimal ratio can be re-calculated using internal pilot data during the
trial. The proposed procedure performed well in a large scale simulation study. We also 
demonstrated that the proposed procedure maintains the nominal type I error rate in the 
simulation. We used an example in cancer diagnostic studies to illustrate the application 
of our method on maximizing the test power and saving overall sample sizes. The results 
indicated that compared with the original sampling ratio, using the proposed two-stage 
procedure for a fixed overall sample size increased the test power. Alternatively, for the 
fixed test power, the proposed procedure reduced the overall sample size by about 17% (61 of 353 subjects).

It is sometimes desirable to minimize the total cost of a diagnostic trial with a limited 
budget. High costs may arise in diagnostic trials that use a gold 
standard test to identify the subjects and markers to diagnose them. This is 
particularly true in medical imaging diagnostic trials, where expensive medical imaging 
devices costing hundreds of dollars for a single session of scans are involved. A case may 
cost more than a control because of higher expenses associated with providing necessary 
medical care when classifying and diagnosing them. We may consider $c_1$ and $c_2$ as costs 
related to a case and a control, respectively. Usually, $c_1$ and $c_2$ can be determined by 
medical experts before conducting a trial. Then, similar to the derivation in Section 2, the 
optimal sampling ratio for minimizing the total cost is given by $r^* = \sqrt{c_2 v_x / (c_1 v_y)}$ for a 
fixed power. This ratio reduces to the one derived in (2.3) when $c_1 = c_2$. An interesting 
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future research topic is to investigate the optimal ratio when the costs are related to the 
true AUC parameters. 
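
The cost-adjusted ratio above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the variance of the test statistic has the form v_x/m + v_y/n for m cases and n controls, that r = m/n, and that sample sizes may be treated as continuous. The function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def cost_optimal_ratio(v_x, v_y, c1, c2):
    """Case-to-control ratio r* = sqrt(c2 * v_x / (c1 * v_y)).

    c1 is the cost per case and c2 the cost per control.
    """
    return math.sqrt((c2 * v_x) / (c1 * v_y))


def total_cost(r, v_x, v_y, c1, c2, target_var):
    """Total cost of a trial with ratio r = m/n that attains target_var.

    Assumes (illustration only) that the variance is v_x/m + v_y/n, so a
    fixed power corresponds to a fixed target variance.
    """
    n = (v_x / r + v_y) / target_var  # number of controls
    m = r * n                         # number of cases
    return c1 * m + c2 * n


if __name__ == "__main__":
    # Hypothetical inputs: a case costs three times as much as a control.
    v_x, v_y, c1, c2, target_var = 0.04, 0.02, 300.0, 100.0, 0.001
    r_star = cost_optimal_ratio(v_x, v_y, c1, c2)
    # Crude grid search over r as an independent check of the closed form.
    grid = [0.05 * k for k in range(1, 201)]
    r_grid = min(grid, key=lambda r: total_cost(r, v_x, v_y, c1, c2, target_var))
    print("closed-form ratio:", r_star)
    print("grid-search ratio:", r_grid)
```

Setting c1 = c2 in `cost_optimal_ratio` gives sqrt(v_x / v_y), which is the equal-cost case mentioned in the text.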
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Appendix 

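Appendix: variance derivation and proof of Proposition 1 

Derivation of $v_x$ and $v_y$ 

We can show that 
\[
\int_0^1\!\int_0^1 S_d\{S_{\bar d,1}^{-1}(s),\, S_{\bar d,2}^{-1}(t)\}\,ds\,dt
\]
can be expressed as 
\[
\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} S_d(y_1, y_2)\,dS_{\bar d,1}(y_1)\,dS_{\bar d,2}(y_2).
\]
Let $S_{\bar d,1}^{-1}(s) = y_1$ and $S_{\bar d,2}^{-1}(t) = y_2$; then we have 
\[
\int_0^1\!\int_0^1 \left[S_d\{S_{\bar d,1}^{-1}(s),\, S_{\bar d,2}^{-1}(t)\}\right] ds\,dt
= E\left[I(X_{1i} > Y_{1j})\, I(X_{2i} > Y_{2l})\right].
\]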

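Similarly, $v_y$ becomes 
\[
v_y = \sum_{\ell=1}^{2}\left[\int_0^1\!\int_0^1 r_\ell(s)\, r_\ell(t)\,(s \wedge t)\,ds\,dt
- \left\{\int_0^1 r_\ell(s)\, s\,ds\right\}^2\right]
- 2\int_0^1\!\int_0^1 r_1(s)\, r_2(t)\left[S_{\bar d}\{S_{\bar d,1}^{-1}(s),\, S_{\bar d,2}^{-1}(t)\} - st\right] ds\,dt.
\]
It follows that 
\[
\int_0^1\!\int_0^1 r_1(s)\, r_2(t)\, S_{\bar d}\{S_{\bar d,1}^{-1}(s),\, S_{\bar d,2}^{-1}(t)\}\,ds\,dt
= \int_0^1\!\int_0^1
\frac{S'_{d,1}\{S_{\bar d,1}^{-1}(s)\}}{S'_{\bar d,1}\{S_{\bar d,1}^{-1}(s)\}}\,
\frac{S'_{d,2}\{S_{\bar d,2}^{-1}(t)\}}{S'_{\bar d,2}\{S_{\bar d,2}^{-1}(t)\}}\,
S_{\bar d}\{S_{\bar d,1}^{-1}(s),\, S_{\bar d,2}^{-1}(t)\}\,ds\,dt.
\]
Let $S_{\bar d,1}^{-1}(s) = y_1$ and $S_{\bar d,2}^{-1}(t) = y_2$; then it follows that 
\[
\begin{aligned}
\int\!\!\int S'_{d,1}(y_1)\, S'_{d,2}(y_2)\, S_{\bar d}(y_1, y_2)\,dy_1\,dy_2
&= E\left[I(X_{1i} < Y_{1j})\, I(X_{2k} < Y_{2j})\right] \\
&= E\left[\{1 - I(X_{1i} > Y_{1j})\}\{1 - I(X_{2k} > Y_{2j})\}\right] \\
&= 1 - E\{I(X_{1i} > Y_{1j})\} - E\{I(X_{2k} > Y_{2j})\} + E\left[I(X_{1i} > Y_{1j})\, I(X_{2k} > Y_{2j})\right].
\end{aligned}
\]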
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Because 

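\[
\int_0^1\!\int_0^1 r_1(s)\, r_2(t)\, st\,ds\,dt
\]
can also be written as 
\[
1 - \Pr(X_{1i} > Y_{1j}) - \Pr(X_{2k} > Y_{2j}) + E\left[I(X_{1i} > Y_{1j})\right] E\left[I(X_{2k} > Y_{2j})\right],
\]
the expressions for $v_x$ and $v_y$ are simplified as follows: 
\[
\begin{aligned}
v_x = {} & \sum_{\ell=1}^{2}\left( E\left[I(X_{\ell i} > Y_{\ell j})\, I(X_{\ell i} > Y_{\ell k})\right] - \left[E\{I(X_{\ell i} > Y_{\ell j})\}\right]^2 \right) \\
& - 2\left( E\left[I(X_{1i} > Y_{1j})\, I(X_{2i} > Y_{2l})\right] - E\left[I(X_{1i} > Y_{1j})\right] E\left[I(X_{2i} > Y_{2l})\right] \right),
\end{aligned}
\tag{A.1}
\]
and 
\[
\begin{aligned}
v_y = {} & \sum_{\ell=1}^{2}\left( E\left[I(X_{\ell i} > Y_{\ell j})\, I(X_{\ell k} > Y_{\ell j})\right] - \left[E\{I(X_{\ell i} > Y_{\ell j})\}\right]^2 \right) \\
& - 2\left( E\left[I(X_{1i} > Y_{1j})\, I(X_{2k} > Y_{2j})\right] - E\left[I(X_{1i} > Y_{1j})\right] E\left[I(X_{2k} > Y_{2j})\right] \right).
\end{aligned}
\tag{A.2}
\]

□ 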
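
The expectations in (A.1) and (A.2) have simple empirical counterparts. The sketch below is an illustration under stated assumptions, not the authors' implementation. It estimates v_x as the sample variance, across cases, of the between-marker difference in the proportion of controls that each case exceeds, and estimates v_y analogously across controls. The function names, the tie convention (ties counted as 1/2), and the use of the sample variance with divisor one less than the sample size are choices made here for illustration. The ratio sqrt(v_x / v_y) is the equal-cost ratio implied by the discussion, with r taken to be the case-to-control ratio.

```python
from math import sqrt
from statistics import variance


def psi(x, y):
    """Kernel I(x > y); ties count as 1/2 (a convention assumed here)."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def variance_components(cases, controls):
    """Estimate (v_x, v_y) from paired results on two markers.

    cases and controls are lists of (marker1, marker2) pairs, with at
    least two subjects in each group.
    """
    m, n = len(cases), len(controls)
    # For each case: difference between markers in the proportion of
    # controls with a smaller result.
    d_case = [
        sum(psi(x[0], y[0]) for y in controls) / n
        - sum(psi(x[1], y[1]) for y in controls) / n
        for x in cases
    ]
    # For each control: difference between markers in the proportion of
    # cases with a larger result.
    d_ctrl = [
        sum(psi(x[0], y[0]) for x in cases) / m
        - sum(psi(x[1], y[1]) for x in cases) / m
        for y in controls
    ]
    return variance(d_case), variance(d_ctrl)


def estimated_ratio(cases, controls):
    """Plug-in estimate sqrt(v_x / v_y); requires a positive v_y estimate."""
    v_x, v_y = variance_components(cases, controls)
    return sqrt(v_x / v_y)


if __name__ == "__main__":
    # Toy pilot data (hypothetical values, for illustration only).
    cases = [(3.0, 1.0), (2.0, 2.5), (4.0, 0.5)]
    controls = [(1.0, 2.0), (2.5, 1.5), (0.5, 0.0)]
    print(variance_components(cases, controls))
    print(estimated_ratio(cases, controls))
```

In a two-stage design, a calculation of this kind would be applied to the first-stage (internal pilot) data to update the sampling ratio for the second stage.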

Proof of Proposition 1 

We first prove that the proposition is true for the A-statistic. Similar arguments can 
then be used for the Delong's statistic. Let w = ^ - . We see that in (4), Wi's are i.i.d. 
random variables independent of i.i.d. random variables vfs. It then follows that 

COV — , Wi—W =COV — , Wi —cov — — , w 

\ m 1 + m 2 ) \ m 1 + m 2 J \ m x + m 2 I 
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equals 0, and 



»1 i mi -i 

■D=/ ^(^(u))^ V/ I(X u ^S^(u))d 

Jo ' m i ^ Jo 

1 m l pi 
+ — E / J ( X ^ < SjJ 



(u))du- I S d:2 (S d2 (u))du. 
o 



We then get 

(^_ Uj ) = _V/ HXu^SjHufidu- I(Xu^S,}(u))du 
mi^Jo ' Jo 

j(x 2i ^ s;g( u ))d u - — E / 7 ( x * ^ ^M)^- 

Therefore, 

— ^ « ^/mi, 

mi — 1 

which indicates that for large sample sizes, $\left(\sum_{i=1}^{m_1+m_2} w_i\right)/(m_1+m_2)$ is independent 
of $v_{x,1}/m_1$. Similarly, we get that for large sample sizes, $\left(\sum_{j=1}^{n_1+n_2} v_j\right)/(n_1+n_2)$ is 
independent of $v_{y,1}/n_1$. □ 
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Fig. 1. Optimal sampling ratio for comparing pAUCs. The observations are from two bivariate normal distributions. 
The FPR is between 0 and 0.6. 
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Table 1 . Simulated power (in %) for comparing AUCs or pAUCs 



Comparing AUCs using DeLong's method 

                    Two-Stage                     r = 1         r = 0.5
ρ     Dist.  AUC    0.70   AR      0.75   AR      0.70   0.75   0.70   0.75
0.1   BN     0.75   79.3   1.001   -      -       79.7   -      74.6   -
             0.80   81.0   1.003   79.1   1.002   78.8   80.5   75.3   74.8
      LN     0.75   80.5   1.001   -      -       80.2   -      75.6   -
             0.80   80.3   1.003   79.4   1.000   79.8   80.6   75.3   75.2
      BE     0.75   81.0   1.340   -      -       80.4   -      71.2   -
             0.80   81.6   1.467   81.8   1.551   80.0   80.4   70.0   69.9
0.25  BN     0.75   79.8   1.002   -      -       80.0   -      74.9   -
             0.80   80.2   1.007   80.3   1.004   79.1   80.2   75.1   74.5
      LN     0.75   79.8   1.002   -      -       79.8   -      75.3   -
             0.80   80.2   1.005   79.8   1.003   79.7   79.6   74.9   75.3
      BE     0.75   83.7   1.412   -      -       82.6   -      74.2   -
             0.80   83.6   1.482   83.5   1.579   82.8   81.0   72.5   71.4

Comparing pAUCs using the A-statistic 

                    Two-Stage                     r = 1         r = 0.5
ρ     Dist.  pAUC   0.30   AR      0.35   AR      0.30   0.35   0.30   0.35
0.1   BN     0.35   79.2   0.952   -      -       78.7   -      73.9   -
             0.40   79.8   1.008   79.3   0.954   80.6   79.2   75.1   75.0
      LN     0.35   79.0   0.953   -      -       78.9   -      74.8   -
             0.40   80.4   1.003   79.3   0.954   81.0   80.6   75.8   74.8
      BE     0.35   84.6   1.249   -      -       84.0   -      76.6   -
             0.40   84.9   1.389   83.6   1.386   84.6   83.9   76.6   74.4
0.25  BN     0.35   78.7   0.947   -      -       78.0   -      75.4   -
             0.40   80.6   1.014   78.7   0.952   80.5   79.8   76.0   75.2
      LN     0.35   79.2   0.949   -      -       79.0   -      74.6   -
             0.40   80.9   1.013   79.3   0.950   80.7   79.2   76.6   75.3
      BE     0.35   86.9   1.219   -      -       87.1   -      80.0   -
             0.40   86.8   1.385   84.5   1.365   86.6   84.0   78.6   76.9

AR - the average ratio, BN - bivariate normal, LN - bivariate lognormal, BE - bivariate 
exponential, $\Omega_1^{A}$ - the AUC for marker 1, $\Omega_2^{A}$ - the AUC for marker 2, $\Omega_1^{pA}$ - the 
pAUC for marker 1, $\Omega_2^{pA}$ - the pAUC for marker 2, ρ - the correlation coefficient of two 
markers. 
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Table 2. Type I error rates (in %) for comparing the AUCs or pAUCs 



             Comparing the AUCs               Comparing the pAUCs
ρ     Dist.  AUCs   N=200   400    500        pAUCs  N=200   400    500
0.10  BN     0.70   [?]     [?]    [?]        0.30   [?]     [?]    4.0
             0.75   [?]     [?]    [?]        0.35   [?]     [?]    [?]
             0.80   [?]     [?]    [?]        0.40   [?]     [?]    [?]
      LN     0.70   5.7     5.3    5.0        0.30   6.8     5.9    4.6
             0.75   5.4     6.5    4.3        0.35   6.8     5.8    5.0
             0.80   5.3     4.4    5.9        0.40   6.7     5.5    6.1
      BE     0.70   5.4     5.1    4.4        0.30   6.2     5.4    5.5
             0.75   5.3     6.6    4.0        0.35   7.7     7.4    5.6
             0.80   5.2     5.1    4.3        0.40   7.5     7.3    5.0
0.25  BN     0.70   4.9     4.8    4.7        0.30   5.9     5.3    5.4
             0.75   5.9     5.4    5.9        0.35   5.2     5.3    5.2
             0.80   5.5     6.1    5.6        0.40   5.6     5.8    5.1
      LN     0.70   5.2     5.3    5.5        0.30   6.0     5.3    5.4
             0.75   5.9     5.4    5.9        0.35   5.2     5.3    5.2
             0.80   5.8     3.9    4.3        0.40   6.7     5.8    5.4
      BE     0.70   4.2     5.0    3.9        0.30   5.0     5.6    4.8
             0.75   5.3     5.7    4.4        0.35   5.7     7.0    6.4
             0.80   5.2     5.1    4.3        0.40   6.8     6.8    6.4

BN - bivariate normal, LN - bivariate lognormal, BE - bivariate exponential, N - 
total required sample size, ρ - the correlation coefficient of two markers, [?] - entry 
not legible in the source. 



Table 3. Power comparison 

                      m (n)
K                     50     60     80     100
1     Ratio           1.02   1.01   0.87   1.07
      Power (%)       31.9   26.6   32.6   33.7
10    Ratio           1.03   0.94   1.03   1.05
      Power (%)       29.9   31.3   31.9   32.5
100   Ratio           1.03   1.02   1.00   1.03
      Power (%)       31.1   30.8   31.1   31.4

K - the number of datasets simulated to estimate variances. 



[Figure: Power size with various pA.] 

[Figure: Neyman allocation ratio for comparing AUCs. Panels: (a) bivariate biexponential variables; (b) bivariate binormal variables.] 